Jedha data 'speed dating' project

Purpose: to practice the Pandas, Matplotlib and Seaborn libraries on the 'speed dating' dataset from Kaggle: https://www.kaggle.com/annavictoria/speed-dating-experiment

We will try to understand which criteria improve the chance of getting a date during a speed dating event.

As the purpose of the exercise is mainly to practice Python data visualisation libraries, we will not be exhaustive about all the data we could look at. We will plot relevant graphs and try to find some patterns, but we will not investigate everything or run any machine learning algorithm (except for some regressions included in graphs).

1) Personal information analysis

In this part we will look at the personal information that attendees declared about themselves in the study. The goal is first to gather some statistical clues about the attendees, how they see themselves and what they expect from the event, before looking at the matching results.

Let's first look at the information given by participants. As the dataset is structured as "one line = one meeting between two attendees", we first extract the personal information attached to each 'iid' line:
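A minimal sketch of this extraction step, assuming the column names of the Kaggle codebook ('iid', 'gender', 'age', 'race', 'imprace', 'imprelig'). The helper name extract_personal_info and the toy rows are hypothetical and only illustrate the idea of dropping the meeting-level columns (such as 'pid' and 'match').

```python
import pandas as pd

# Assumed column names (from the Kaggle codebook): personal answers only.
PERSONAL_COLS = ["iid", "gender", "age", "race", "imprace", "imprelig"]


def extract_personal_info(meetings: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns describing the attendee 'iid' themselves (hypothetical helper)."""
    missing = set(PERSONAL_COLS) - set(meetings.columns)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    return meetings[PERSONAL_COLS].copy()


# Hypothetical toy data: attendee 1 met two partners, attendee 2 met one.
meetings = pd.DataFrame({
    "iid": [1, 1, 2],
    "pid": [11, 12, 11],      # partner id, meeting-level
    "match": [0, 1, 1],       # meeting-level outcome
    "gender": [0, 0, 1],
    "age": [21.0, 21.0, 25.0],
    "race": [4, 4, 2],
    "imprace": [2.0, 2.0, 7.0],
    "imprelig": [4.0, 4.0, 1.0],
})

personal = extract_personal_info(meetings)
print(personal.shape)  # (3, 6): still one line per meeting at this stage
```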

Now we can group by "iid" to get a dataset where "one line = one attendee", which makes the personal information easier to work with:
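One way to collapse the meeting lines, sketched on hypothetical toy rows: pandas' GroupBy.first() keeps, for each column, the first non-null value within each 'iid' group, so an answer missing on one meeting line can be picked up from another line of the same attendee. This assumes the personal answers are repeated identically on every line of an attendee.

```python
import numpy as np
import pandas as pd

# Hypothetical meeting-level personal columns (one line per meeting).
personal = pd.DataFrame({
    "iid": [1, 1, 2],
    "gender": [0, 0, 1],
    "age": [np.nan, 21.0, 25.0],   # answer missing on one meeting line
    "imprace": [2.0, 2.0, 7.0],
    "imprelig": [4.0, 4.0, 1.0],
})

# First non-null value per column and per attendee; 'iid' becomes the index.
attendees = personal.groupby("iid").first()

print(len(attendees))           # 2 attendees
print(attendees.loc[1, "age"])  # 21.0, taken from the second meeting line
```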

Importance of common religion (mark from 0 to 10)

Attendees were asked whether sharing the same religion was an important criterion for them. They had to rate this importance from 0 to 10.

Religion doesn't seem to matter much in this sample, as 75% of attendees gave a mark lower than 6. Let's look at the 'race' feature.

Importance of common 'race' (mark from 0 to 10)

Same conclusion as for religion. Does it have any effect on matching?

At first glance, sharing the same race seems to make a match slightly more likely, but given the confidence intervals, the difference does not appear to be significant.
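A sketch of this comparison, working on the meeting-level data (where 'samerace' and 'match' are 0/1 columns, as assumed from the Kaggle codebook). seaborn's barplot shows the mean match rate per group with a bootstrapped 95% confidence interval by default. The helper match_rate_ci is hypothetical and swaps in a normal-approximation (Wald) interval so the numbers can be read directly. The toy counts are made up, and because each attendee appears in many meetings, the observations are not independent, so these intervals are only a rough guide.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def match_rate_ci(meetings: pd.DataFrame, z: float = 1.96) -> pd.DataFrame:
    """Match rate per 'samerace' group with a normal-approximation CI (hypothetical helper)."""
    grouped = meetings.groupby("samerace")["match"]
    rate = grouped.mean()
    n = grouped.count()
    half_width = z * np.sqrt(rate * (1 - rate) / n)
    return pd.DataFrame({"rate": rate, "n": n,
                         "low": rate - half_width, "high": rate + half_width})


# Hypothetical toy counts: 16/100 matches across races, 11/60 within the same race.
meetings = pd.DataFrame({
    "samerace": [0] * 100 + [1] * 60,
    "match": [1] * 16 + [0] * 84 + [1] * 11 + [0] * 49,
})

ci = match_rate_ci(meetings)
print(ci.round(3))

# Mean match rate per group, error bars = seaborn's default bootstrapped CI.
fig, ax = plt.subplots()
sns.barplot(data=meetings, x="samerace", y="match", ax=ax)
ax.set_xlabel("Same race (0 = no, 1 = yes)")
ax.set_ylabel("Match rate")
plt.show()
```

When the two intervals overlap widely, as they do with these toy counts, the gap between the two match rates is hard to distinguish from noise.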